High-Performance Linear Algebra Processor using FPGA

Authors

  • J. R. Johnson
  • P. Nagvajara
  • C. Nwankpa
Abstract

Recent advances in FPGA (Field Programmable Gate Array) technology make it feasible to use these devices to build special-purpose processors for the floating-point-intensive applications that arise in scientific computing. FPGAs provide programmable hardware that can be used to design custom hardware without the high cost of traditional hardware design. In this talk we discuss two multi-processor FPGA designs for basic linear algebra computations such as matrix multiplication and LU factorization. The first design is a purely hardware solution for dense matrix computations; the second is a hardware/software solution for sparse matrix computations. The hardware solution exploits the regular structure of dense linear algebra computations to design custom processors with hard-wired communication patterns. The hardware/software solution uses embedded processors, which provide the flexibility to program the irregular communication patterns required by sparse matrix computations. The dense matrix processor uses a distributed-memory architecture connected in a ring topology, with hard-wired control for communication. Each processing element consists of pipelined multiply-accumulate hardware and local memory that stores part of the input and output matrices.

∗ This work was partially supported by DOE grant #ER63384, PowerGrid A Computation Engine for Large-Scale Electric Networks.
† Department of Computer Science, Drexel University, Philadelphia, PA 19104. email:
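The ring-connected dense design described in the abstract can be illustrated with a small software simulation. The sketch below is not the authors' hardware design: the abstract does not say how the matrices are partitioned, so it assumes each processing element (PE) keeps a block of rows of A and of C in local memory while blocks of rows of B circulate around the ring. The function name `ring_matmul` and the parameter `num_pes` are hypothetical.

```python
def ring_matmul(a, b, num_pes):
    """Simulate C = A * B on a ring of num_pes processing elements.

    Assumed data layout (not stated in the abstract): PE p owns a block of
    rows of A and C; blocks of rows of B travel around the ring.
    """
    n = len(a)
    if n % num_pes != 0:
        raise ValueError("matrix size must be divisible by num_pes")
    rows = n // num_pes

    # Local memory of each PE: its rows of A and its partial rows of C.
    local_a = [a[p * rows:(p + 1) * rows] for p in range(num_pes)]
    local_c = [[[0.0] * n for _ in range(rows)] for _ in range(num_pes)]

    # Block of B currently held by each PE, tagged with the index of the
    # row block it came from so the PE knows which columns of A to use.
    in_flight = [(p, b[p * rows:(p + 1) * rows]) for p in range(num_pes)]

    for _ in range(num_pes):
        # Compute phase: every PE multiply-accumulates with the block it holds.
        for p in range(num_pes):
            owner, b_block = in_flight[p]
            for i in range(rows):
                for j in range(n):
                    acc = local_c[p][i][j]
                    for k in range(rows):
                        acc += local_a[p][i][owner * rows + k] * b_block[k][j]
                    local_c[p][i][j] = acc
        # Communication phase: fixed pattern, each PE receives the block
        # held by its predecessor on the ring.
        in_flight = [in_flight[(p - 1) % num_pes] for p in range(num_pes)]

    # Gather the row blocks of C in PE order.
    return [row for p in range(num_pes) for row in local_c[p]]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(ring_matmul(a, b, num_pes=2))
```

After `num_pes` compute-and-shift steps every PE has seen every block of B, so its rows of C are complete. The communication pattern is the same at every step, which is the kind of regularity that hard-wired control can exploit.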

Similar resources

Exploiting Mixed Precision Floating Point Hardware in Scientific Computations

By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical P...

Design and Implementation of Field Programmable Gate Array Based Baseband Processor for Passive Radio Frequency Identification Tag (TECHNICAL NOTE)

In this paper, an Ultra High Frequency (UHF) base band processor for a passive tag is presented. It proposes a Radio Frequency Identification (RFID) tag digital base band architecture which is compatible with the EPC C C2/ISO18000-6B protocol. Several design approaches such as clock gating technique, clock strobe design and clock management are used. In order to reduce the area Decimal Matrix C...

Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems

By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical P...

Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)

Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...

Trading Off Performance for Energy in Linear Algebra Operations with Applications in Control Theory

We analyze the performance-power-energy balance of a conventional Intel Xeon multicore processor and two low-power architectures (an Intel Atom processor and a system with a quad-core ARM Cortex A9 + NVIDIA Quadro 1000M) using a high-performance implementation of Gauss-Jordan elimination (GJE) for matrix inversion. The blocked version of this algorithm employed in the experimental evaluation most...

Publication date: 2003